1.
medrxiv; 2022.
Preprint in English | medRxiv | ID: ppzbmed-10.1101.2022.11.14.22282297

ABSTRACT

The continuing emergence of SARS-CoV-2 variants of concern (VOCs) presents a serious public health threat, exacerbating the effects of the COVID-19 pandemic. Although millions of genomes have been deposited in public archives since the start of the pandemic, predicting SARS-CoV-2 clinical characteristics from the genome sequence remains challenging. In this study, we used a collection of over 29,000 high-quality SARS-CoV-2 genomes to build machine learning models for predicting clinical detection cycle threshold (Ct) values, which correspond with viral load. After evaluating several machine learning methods and parameters, our best model was a random forest regressor that used 10-mer oligonucleotides as features and achieved an R² score of 0.521 ± 0.010 (95% confidence interval over 5 folds) and an RMSE of 5.7 ± 0.034, demonstrating the ability of the models to detect the presence of a signal in the genomic data. In an attempt to predict Ct values for newly emerging variants, we predicted Ct values for Omicron variants using models trained on previous variants. We found that approximately 5% of the data in the model needed to be from the new variant in order to learn its Ct values. Finally, to understand how the model works, we evaluated the top features and found that the model uses a multitude of k-mers from across the genome to make its predictions. However, when we looked at the top k-mers that occurred most frequently across the set of genomes, we observed a clustering of k-mers that span spike protein regions corresponding with key variations that are hallmarks of the VOCs, including G339, K417, L452, N501, and P681, indicating that these sites are informative in the model and may impact the Ct values that are observed in clinical samples.


Subject(s)
COVID-19
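The k-mer feature construction mentioned in the first abstract (10-mer oligonucleotides used as features for a regressor) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `kmer_counts` and `feature_matrix`, the choice to skip windows containing ambiguous bases, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter


def kmer_counts(sequence, k=10):
    """Count overlapping k-mers, skipping windows with non-ACGT bases (assumed policy)."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def feature_matrix(genomes, k=10):
    """Return a sorted k-mer vocabulary and one count vector per genome."""
    per_genome = [kmer_counts(g, k) for g in genomes]
    vocab = sorted(set().union(*per_genome))
    rows = [[c[kmer] for kmer in vocab] for c in per_genome]
    return vocab, rows


# Toy sequences (hypothetical), with k=3 so the result is easy to inspect.
genomes = ["ACGTAC", "ACGTNC"]
vocab, rows = feature_matrix(genomes, k=3)
print(vocab)
print(rows)
```

Each row is a count vector that could be passed, together with measured Ct values, to a regression model. scikit-learn's `RandomForestRegressor` would be one option, though the abstract does not name a library.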
2.
medrxiv; 2022.
Preprint in English | medRxiv | ID: ppzbmed-10.1101.2022.08.17.22278898

ABSTRACT

The COVID-19 pandemic has resulted in extensive surveillance of the genomic diversity of SARS-CoV-2. Sequencing data generated as part of these efforts can also capture the diversity of the SARS-CoV-2 virus populations replicating within infected individuals. To assess this within-host diversity of SARS-CoV-2, we quantified low-frequency (minor) variants from deep sequencing data of thousands of clinical samples collected by a large urban hospital system over the course of a year. Using a robust analytical pipeline to control for technical artefacts, we observe that, at comparable viral loads, specimens from patients hospitalized due to COVID-19 had a greater number of minor variants than samples from outpatients. Since individuals with highly diverse viral populations could be disproportionate drivers of new viral lineages in the patient population, these results suggest that transmission control should pay special attention to patients with severe or protracted disease to prevent the spread of novel variants.


Subject(s)
COVID-19
3.
biorxiv; 2021.
Preprint in English | bioRxiv | ID: ppzbmed-10.1101.2021.09.27.461949

ABSTRACT

The ARTIC Network provides a common resource of PCR primer sequences and recommendations for amplifying SARS-CoV-2 genomes. The initial tiling strategy was developed with the reference genome Wuhan-01, and subsequent iterations have addressed areas of low amplification and sequence dropout. Recently, a new version (V4) was released, based on new variant genome sequences, in response to the realization that some V3 primers were located in regions with key mutations. Herein, we compare the performance of the ARTIC V3 and V4 primer sets with a matched set of 663 SARS-CoV-2 clinical samples sequenced with an Illumina NovaSeq 6000 instrument. We observe general improvements in sequencing depth and quality, and improved resolution of the SNP causing the D950N variation in the spike protein. Importantly, we also find nearly universal presence of spike protein substitution G142D in Delta-lineage samples. Due to the prior release and widespread use of the ARTIC V3 primers during the initial surge of the Delta variant, it is likely that the G142D amino acid substitution is substantially underrepresented among early Delta variant genomes deposited in public repositories. In addition to the improved performance of the ARTIC V4 primer set, this study also illustrates the importance of the primer scheme in downstream analyses.

IMPORTANCE

ARTIC Network primers are commonly used by laboratories worldwide to amplify and sequence SARS-CoV-2 present in clinical samples. As new variants have evolved and spread, it was found that the V3 primer set poorly amplified several key mutations. In this report, we compare the results of sequencing a matched set of samples with the V3 and V4 primer sets. We find that adoption of the ARTIC V4 primer set is critical for accurate sequencing of the SARS-CoV-2 spike region. The absence of metadata describing the primer scheme used will negatively impact the downstream use of publicly available SARS-CoV-2 sequencing reads and assembled genomes.

4.
medrxiv; 2021.
Preprint in English | medRxiv | ID: ppzbmed-10.1101.2021.07.19.21260808

ABSTRACT

Genetic variants of SARS-CoV-2 have repeatedly altered the course of the COVID-19 pandemic, and disease in individual patients. Delta variants (B.1.617.2, AY.2, and AY.3) are now the focus of international concern because they are causing widespread COVID-19 disease globally. Vaccine breakthrough cases caused by SARS-CoV-2 variants also are of considerable public health and medical concern worldwide. As part of a comprehensive project, we sequenced the genomes of 3,913 SARS-CoV-2 from patient samples acquired March 15, 2021 through July 3, 2021 in the Houston Methodist hospital system and studied vaccine breakthrough cases. During the study period Delta variants increased to cause 58% of all COVID-19 cases and spread throughout the metropolitan Houston area. In addition, Delta variants caused a significantly higher rate of vaccine breakthrough cases (19.7% compared to 5.8% for all other variants). Importantly, only 6.5% of all COVID-19 cases occurred in fully immunized individuals, and relatively few of these patients required hospitalization. Our genomic and epidemiologic data emphasize that vaccines used in the United States are highly effective in decreasing severe COVID-19 disease, hospitalizations, and deaths.


Subject(s)
COVID-19
5.
medrxiv; 2021.
Preprint in English | medRxiv | ID: ppzbmed-10.1101.2021.05.20.21257552

ABSTRACT

Genetic variants of the SARS-CoV-2 virus are of substantial concern because they can detrimentally alter the pandemic course and disease features in individual patients. Here we report SARS-CoV-2 genome sequences from 12,476 patients in the Houston Methodist healthcare system diagnosed from January 1, 2021 through May 31, 2021. The SARS-CoV-2 variant designated U.K. B.1.1.7 increased rapidly and caused 63%-90% of all new cases in the Houston area in the latter half of May. Eleven of the 3,276 B.1.1.7 genomes had an E484K change in spike protein. Compared with non-B.1.1.7 patients, individuals with B.1.1.7 had a significantly lower cycle threshold value (a proxy for higher virus load) and significantly higher rate of hospitalization. Other variants (e.g., B.1.429, B.1.427, P.1, P.2, and R.1) also increased rapidly, although the magnitude was less than for B.1.1.7. We identified 22 patients infected with B.1.617 "India" variants; these patients had a high rate of hospitalization. Vaccine breakthrough cases (n=207) were caused by a heterogeneous array of virus genotypes, including many that are not variants of interest or concern. In the aggregate, our study delineates the trajectory of concerning SARS-CoV-2 variants circulating in a major metropolitan area, documents B.1.1.7 as the major cause of new cases in Houston, and heralds the arrival and spread of B.1.617 variants in the metroplex.

6.
medrxiv; 2021.
Preprint in English | medRxiv | ID: ppzbmed-10.1101.2021.02.26.21252227

ABSTRACT

Since the beginning of the SARS-CoV-2 pandemic, there has been international concern about the emergence of virus variants with mutations that increase transmissibility, enhance escape from the human immune response, or otherwise alter biologically important phenotypes. In late 2020, several "variants of concern" emerged globally, including the UK variant (B.1.1.7), South Africa variant (B.1.351), Brazil variants (P.1 and P.2), and two related California "variants of interest" (B.1.429 and B.1.427). These variants are believed to have enhanced transmissibility. For the South Africa and Brazil variants, there is evidence that mutations in spike protein permit it to escape from some vaccines and therapeutic monoclonal antibodies. Based on our extensive genome sequencing program involving 20,453 virus specimens from COVID-19 patients dating from March 2020, we report identification of all important SARS-CoV-2 variants among Houston Methodist Hospital patients residing in the greater metropolitan area. Although these variants are currently at relatively low frequency in the population, they are geographically widespread. Houston is the first city in the United States to have all variants documented by genome sequencing. As vaccine deployment accelerates worldwide, increased genomic surveillance of SARS-CoV-2 is essential to understanding the presence and frequency of consequential variants and their patterns and trajectory of dissemination. This information is critical for medical and public health efforts to effectively address and mitigate this global crisis.


Subject(s)
COVID-19
7.
medrxiv; 2020.
Preprint in English | medRxiv | ID: ppzbmed-10.1101.2020.09.22.20199125

ABSTRACT

We sequenced the genomes of 5,085 SARS-CoV-2 strains causing two COVID-19 disease waves in metropolitan Houston, Texas, an ethnically diverse region with seven million residents. The genomes were from viruses recovered in the earliest recognized phase of the pandemic in Houston, and an ongoing massive second wave of infections. The virus was originally introduced into Houston many times independently. Virtually all strains in the second wave have a Gly614 amino acid replacement in the spike protein, a polymorphism that has been linked to increased transmission and infectivity. Patients infected with the Gly614 variant strains had significantly higher virus loads in the nasopharynx on initial diagnosis. We found little evidence of a significant relationship between virus genotypes and altered virulence, stressing the linkage between disease severity, underlying medical conditions, and host genetics. Some regions of the spike protein - the primary target of global vaccine efforts - are replete with amino acid replacements, perhaps indicating the action of selection. We exploited the genomic data to generate defined single amino acid replacements in the receptor binding domain of spike protein that, importantly, produced decreased recognition by the neutralizing monoclonal antibody CR3022. Our study is the first analysis of the molecular architecture of SARS-CoV-2 in two infection waves in a major metropolitan region. The findings will help us to understand the origin, composition, and trajectory of future infection waves, and the potential effect of the host immune response and therapeutic maneuvers on SARS-CoV-2 evolution.

IMPORTANCE

There is concern about second and subsequent waves of COVID-19 caused by the SARS-CoV-2 coronavirus occurring in communities globally that had an initial disease wave. Metropolitan Houston, Texas, with a population of 7 million, is experiencing a massive second disease wave that began in late May 2020. To understand SARS-CoV-2 molecular population genomic architecture, evolution, and relationship between virus genotypes and patient features, we sequenced the genomes of 5,085 SARS-CoV-2 strains from these two waves. Our study provides the first molecular characterization of SARS-CoV-2 strains causing two distinct COVID-19 disease waves.


Subject(s)
COVID-19